Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud netconfig #19922

Merged
merged 2 commits into from
Aug 22, 2024
Merged

Cloud netconfig #19922

merged 2 commits into from
Aug 22, 2024

Conversation

pdostal
Copy link
Member

@pdostal pdostal commented Aug 9, 2024

There are 2 commits, first is just cleanup and second is improvements.
The new terraform file is just old azure.tf with two network interfaces each with primary public IP address and secondary private IP address.

This is not yet finish. I'll add more test and I'm not happy with the Terraform part. Similar testing should be also done in EC2 and GCE and I'd like to do it in one pull request as it won't differ much - only Terraform part will differ a lot.

If you have suggestion I'd appreciate those.

@pdostal pdostal removed the WIP Work in progress label Aug 13, 2024
Copy link
Member

@asmorodskyi asmorodskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@grisu48 grisu48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall nice work. I left some improvement comments but overall this should work.

I'm not sure if we need an entirely new terraform profile, given the similarities with azure.tf 🤔

cmd => 'systemctl status cloud-netconfig.timer',
fail_message => 'cloud-netconfig.timer appears not to be started.'
);
$instance->ssh_assert_script_run('systemctl status cloud-netconfig.timer');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$instance->ssh_assert_script_run('systemctl status cloud-netconfig.timer');
$instance->ssh_assert_script_run('systemctl is-active cloud-netconfig.timer');

chomp($local_eth0_ip);
my $metadata_eth0_ip = $provider->query_metadata($instance, ifNum => '0', addrCount => '0');
die("Failed to get interface 0 IP from metadata server") unless length($metadata_eth0_ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
die("Failed to get interface 0 IP from metadata server") unless length($metadata_eth0_ip);
die("Failed to get interface IPs from metadata server") unless length($metadata_eth0_ip);

I would remove the interface numbers here, as they are confusing to anyone who has not the full code in their head. I think this makes a better error message, because it still explains what happens but without unnecessary (and confusing) interface indices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied your suggestion and moved the check inside $provider->query_metadata().

} else {
die("Unsupported public cloud provider.");
if (is_azure) {
record_info('eth1', 'Get public IP address for eth1');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this information would be better as a code comment instead of a information box.

die("Unsupported public cloud provider.");
if (is_azure) {
record_info('eth1', 'Get public IP address for eth1');
my $local_eth1_ip = $instance->ssh_script_output(qq(ip -4 -o a s eth1 primary | grep -Po "inet \\K[\\d.]+" | head -n1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is head -n1 necessary here? 🤔

my $local_eth1_ip = $instance->ssh_script_output(qq(ip -4 -o a s eth1 primary | grep -Po "inet \\K[\\d.]+" | head -n1));
chomp($local_eth1_ip);
my $metadata_eth1_ip = $provider->query_metadata($instance, ifNum => '1', addrCount => '0');
die("Failed to get interface 1 IP from metadata server") unless length($metadata_eth1_ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
die("Failed to get interface 1 IP from metadata server") unless length($metadata_eth1_ip);
die("Failed to get interface IP from metadata server") unless length($metadata_eth1_ip);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied your suggestion and moved the check inside $provider->query_metadata().

die('There are no ARP neighbors on eth0') if ($instance->ssh_script_output("ip neighbor show dev eth0 | wc -l") == 0);
die('There are no ARP neighbors on eth1') if ($instance->ssh_script_output("ip neighbor show dev eth1 | wc -l") == 0);

record_info('Local removal', 'Remove eth0 secondary address and eth1 default route and check if it reappears.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point of this check for cloud_netconfig. This is testing something different IMHO and could be removed. What do you think? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://github.com/SUSE-Enceladus/cloud-netconfig/blob/master/README.md:

Interface configurations will be checked periodically on each DHCP lease renewal and additionally, if the systemd timer is enabled, every 60 seconds. cloud-netconfig detects changes in the metadata configuration and updates interface configurations and routing policies accordingly. This means that IP addresses and ranges that were removed from the virtual interface configuration will be removed from the interface, but only addresses and ranges that were automatically added by cloud-netconfig will be removed. Addresses added manually by the administrator or by another tool (e.g. high-availability software) will not be touched.

From my understanding cloud-netconfig periodically checks if the interfaces and addresses defined in cloud metadata are present in the system. This checks that by removing some and checking if they got reapplied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yes then it makes sense 👍

$instance->ssh_assert_script_run("sudo ip route del default dev eth1");
record_info('debug', $instance->ssh_script_output('ip addr show; ip route show'));

sleep(90);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced it with systemctl start cloud-netconfig.service so please recheck

my $nic_name = script_output(qq(az network nic list --resource-group $resource_group | jq -r '.[]|select(.primary==false).name'));
my $ipConfig_name = script_output(qq(az network nic list --resource-group $resource_group | jq -r '.[]|select(.primary==false).ipConfigurations[]|select(.privateIPAddress=="$local_eth1_secondary_ip").name'));
assert_script_run("az network nic ip-config delete -g $resource_group -n $ipConfig_name --nic-name $nic_name");
sleep(90);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why the sleep is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced it with systemctl start cloud-netconfig.service so please recheck

assert_script_run("az network nic ip-config delete -g $resource_group -n $ipConfig_name --nic-name $nic_name");
sleep(90);
record_info('debug', $instance->ssh_script_output('ip addr show; ip route show'));
die('There is secondary IP address in eth1 altho it has been removed in the provider') if ($instance->ssh_script_run(qq(ip -4 -o a s dev eth1 secondary | grep -Po "inet \\K[\\d.]+" | wc -l)) != 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
die('There is secondary IP address in eth1 altho it has been removed in the provider') if ($instance->ssh_script_run(qq(ip -4 -o a s dev eth1 secondary | grep -Po "inet \\K[\\d.]+" | wc -l)) != 0);
die('Secondary IP address in eth1 still present after removed via provider') if ($instance->ssh_script_run(qq(ip -4 -o a s dev eth1 secondary | grep -Po "inet \\K[\\d.]+" | wc -l)) != 0);

my $instance = shift;
record_info('DEBUG');

$instance->ssh_script_run("ip -j a s");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary if the following commands print exactly the same (just not as json)? 🤔

@pdostal pdostal force-pushed the cloud_netconfig branch 2 times, most recently from 762d973 to 0757bc7 Compare August 21, 2024 12:48
@pdostal
Copy link
Member Author

pdostal commented Aug 21, 2024

Build20240820-1 of sle-15-SP6-Azure-BYOS-Updates.aarch64 publiccloud_cloud_netconfig@64bit 4 17 minutes ago
Build20240820-1 of sle-15-SP6-EC2-BYOS-Updates.aarch64 publiccloud_cloud_netconfig@64bit 4 22 minutes ago
Build20240820-1 of sle-15-SP6-EC2-Updates.x86_64 publiccloud_cloud_netconfig@64bit 4 22 minutes ago
Build20240820-1 of sle-15-SP6-GCE-Updates.aarch64 publiccloud_cloud_netconfig@64bit 4 23 minutes ago
Build20240820-1 of sle-15-SP6-GCE-BYOS-Updates.aarch64 publiccloud_cloud_netconfig@64bit 4 23 minutes ago
Build20240820-1 of sle-12-SP5-Azure-BYOS-Updates.x86_64 publiccloud_cloud_netconfig@64bit 4 26 minutes ago
Build20240820-1 of sle-15-SP6-Azure-Standard-Updates.x86_64 publiccloud_cloud_netconfig@64bit 4 29 minutes ago
Build20240820-1 of sle-15-SP6-Azure-Basic-Updates.x86_64 publiccloud_cloud_netconfig@64bit 4 30 minutes ago
Build20240820-1 of sle-15-SP6-Azure-BYOS-Updates.x86_64 publiccloud_cloud_netconfig@64bit 4 32 minutes ago

default = "gen2"
}

variable "extra-disk-size" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we creating dedicated profile we may cleaned up from things which are not needed for this certain test . we won't add extra disk here so let's drop quite some code from it

default = false
}

variable "cloud_init" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here this variable is needed by another test and if it is test suite specific profile we can drop this

Copy link
Member

@asmorodskyi asmorodskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last comments from me is just nice to have , feel free to ignore them . LGTM

@asmorodskyi asmorodskyi merged commit 4251800 into os-autoinst:master Aug 22, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants